Parsing the Calorie Count Website for Exercise Calorie Data Using Python


In [1]:
from lxml import html
import requests

# Download the exercise index page and parse it into an lxml element tree
page = requests.get('https://www.caloriecount.com/exercise')
tree = html.fromstring(page.content)

In [2]:
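# Collect the href of every exercise-category link (this positional XPath depends on the page layout)
ExerciseTypeLinks = tree.xpath('//*[@id="content"]/div[1]/div[2]/div[1]/div[2]/ul/li/a/@href')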
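The XPath above hard-codes the position of every `div`, so a small change to the page layout would silently return an empty list. As a hedged alternative, the sketch below uses only the standard library's `html.parser` and keeps every `href` that looks like a category link. The `/activities-<name>-ac<N>` pattern is an assumption inferred from the links listed in Out[3], not a documented URL scheme, and `CategoryLinkParser` / `category_links` are hypothetical names; the sample HTML is a toy input, not the real page.

```python
import re
from html.parser import HTMLParser

# Assumed URL shape for category pages, inferred from the scraped links
CATEGORY_HREF = re.compile(r"^/activities-[a-z-]+-ac\d+$")


class CategoryLinkParser(HTMLParser):
    """Collect hrefs of <a> tags whose target matches CATEGORY_HREF."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Keep only category links, in document order, without duplicates
        if href and CATEGORY_HREF.match(href) and href not in self.links:
            self.links.append(href)


def category_links(html_text):
    parser = CategoryLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.links


# Toy page fragment standing in for the downloaded HTML
sample = """
<div id="content"><ul>
  <li><a href="/activities-bicycling-ac1">Bicycling</a></li>
  <li><a href="/activities-conditioning-exercise-ac2">Conditioning</a></li>
  <li><a href="/about">About</a></li>
</ul></div>
"""
print(category_links(sample))
```

Matching on the link's shape rather than its position trades one fragility for another: it survives layout changes but would miss links if the site renamed its URLs.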

In [3]:
### Show the exercise category links
ExerciseTypeLinks


Out[3]:
['/activities-bicycling-ac1',
 '/activities-conditioning-exercise-ac2',
 '/activities-dancing-ac3',
 '/activities-fishing-hunting-ac4',
 '/activities-home-activities-ac5',
 '/activities-home-repair-ac6',
 '/activities-inactivity-ac7',
 '/activities-lawn-garden-ac8',
 '/activities-miscellaneous-ac9',
 '/activities-music-playing-ac10',
 '/activities-occupation-ac11',
 '/activities-religious-activities-ac20',
 '/activities-running-ac12',
 '/activities-self-care-hygiene-ac13',
 '/activities-sexual-activity-ac14',
 '/activities-sports-ac15',
 '/activities-transportation-ac16',
 '/activities-volunteer-activities-ac21',
 '/activities-walking-ac17',
 '/activities-water-activities-ac18',
 '/activities-winter-activities-ac19']
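Each category link ends in an `ac<N>` suffix, and the list is not in numeric order (`ac20` sits between `ac11` and `ac12`). A small sketch that splits a link into its numeric ID and a readable category name, assuming every link follows the `/activities-<slug>-ac<N>` shape seen above; `parse_category` is a hypothetical helper.

```python
import re

# Assumed shape: /activities-<slug>-ac<N>
CATEGORY_RE = re.compile(r"^/activities-(?P<slug>[a-z-]+)-ac(?P<id>\d+)$")


def parse_category(href):
    """Return (category id, human-readable name) for one category link."""
    m = CATEGORY_RE.match(href)
    if m is None:
        raise ValueError(f"unexpected category link: {href}")
    return int(m["id"]), m["slug"].replace("-", " ")


links = ['/activities-occupation-ac11',
         '/activities-religious-activities-ac20',
         '/activities-running-ac12']
# Sorting the (id, name) tuples orders the categories by numeric ID
for cat_id, name in sorted(parse_category(h) for h in links):
    print(cat_id, name)
```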

In [4]:
# Turn the relative category hrefs into absolute URLs
fl = ["https://www.caloriecount.com" + link for link in ExerciseTypeLinks]
fl


Out[4]:
['https://www.caloriecount.com/activities-bicycling-ac1',
 'https://www.caloriecount.com/activities-conditioning-exercise-ac2',
 'https://www.caloriecount.com/activities-dancing-ac3',
 'https://www.caloriecount.com/activities-fishing-hunting-ac4',
 'https://www.caloriecount.com/activities-home-activities-ac5',
 'https://www.caloriecount.com/activities-home-repair-ac6',
 'https://www.caloriecount.com/activities-inactivity-ac7',
 'https://www.caloriecount.com/activities-lawn-garden-ac8',
 'https://www.caloriecount.com/activities-miscellaneous-ac9',
 'https://www.caloriecount.com/activities-music-playing-ac10',
 'https://www.caloriecount.com/activities-occupation-ac11',
 'https://www.caloriecount.com/activities-religious-activities-ac20',
 'https://www.caloriecount.com/activities-running-ac12',
 'https://www.caloriecount.com/activities-self-care-hygiene-ac13',
 'https://www.caloriecount.com/activities-sexual-activity-ac14',
 'https://www.caloriecount.com/activities-sports-ac15',
 'https://www.caloriecount.com/activities-transportation-ac16',
 'https://www.caloriecount.com/activities-volunteer-activities-ac21',
 'https://www.caloriecount.com/activities-walking-ac17',
 'https://www.caloriecount.com/activities-water-activities-ac18',
 'https://www.caloriecount.com/activities-winter-activities-ac19']
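Prefixing the host by string concatenation works here because every href starts with `/`. The standard library's `urllib.parse.urljoin` resolves hrefs against the URL of the page they came from, the way a browser does, so it also copes with links that are already absolute (as the activity links in Out[5] are) or relative without a leading slash. A minimal sketch; the mixed `hrefs` list is illustrative input.

```python
from urllib.parse import urljoin

# URL of the page the links were scraped from, used as the resolution base
BASE = "https://www.caloriecount.com/exercise"

hrefs = ['/activities-bicycling-ac1',
         'https://www.caloriecount.com/calories-burned-general-a41',
         'activities-dancing-ac3']
absolute = [urljoin(BASE, h) for h in hrefs]
print(absolute)
```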

In [5]:
# Fetch one category page (fl[2], dancing) and collect its activity links
page = requests.get(fl[2])
tree = html.fromstring(page.content)
exlinks = tree.xpath('//*[@id="content"]/div[1]/div[3]/ul/li/a/@href')
exlinks


Out[5]:
['https://www.caloriecount.com/calories-burned-aerobic-a36',
 'https://www.caloriecount.com/calories-burned-aerobic-a40',
 'https://www.caloriecount.com/calories-burned-aerobic-a39',
 'https://www.caloriecount.com/calories-burned-aerobic-a38',
 'https://www.caloriecount.com/calories-burned-aerobic-a37',
 'https://www.caloriecount.com/calories-burned-anishinaabe-jingle-dancing-other-a45',
 'https://www.caloriecount.com/calories-burned-ballet-modern-a35',
 'https://www.caloriecount.com/calories-burned-ballroom-a42',
 'https://www.caloriecount.com/calories-burned-ballroom-a43',
 'https://www.caloriecount.com/calories-burned-ballroom-a44',
 'https://www.caloriecount.com/calories-burned-general-a41']
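Several links in Out[5] share the same activity name (five `aerobic` pages, three `ballroom` pages) and differ only in their `a<N>` ID, so the name alone cannot tell them apart; the next cell therefore also scrapes the variant text (for example "High Impact"). The sketch below groups activity URLs by name, assuming the `calories-burned-<slug>-a<N>` pattern seen above; `group_by_activity` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Assumed shape: .../calories-burned-<slug>-a<N>
ACTIVITY_RE = re.compile(r"/calories-burned-(?P<slug>[a-z-]+)-a(?P<id>\d+)$")


def group_by_activity(urls):
    """Map each activity name to the sorted list of its numeric IDs."""
    groups = defaultdict(list)
    for url in urls:
        m = ACTIVITY_RE.search(url)
        if m:  # silently skip URLs that do not match the assumed pattern
            groups[m["slug"]].append(int(m["id"]))
    return {slug: sorted(ids) for slug, ids in groups.items()}


urls = ['https://www.caloriecount.com/calories-burned-aerobic-a36',
        'https://www.caloriecount.com/calories-burned-aerobic-a40',
        'https://www.caloriecount.com/calories-burned-ballroom-a42',
        'https://www.caloriecount.com/calories-burned-general-a41']
print(group_by_activity(urls))
```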

In [ ]:
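# Visit each activity page, print its fields, and save one tab-separated line per activity
with open("ActivityCalories_from_caloriecount_dot_com.txt", "w") as fo:
    for link in exlinks:
        page = requests.get(link)
        tree = html.fromstring(page.content)
        title = tree.xpath('//*[@id="activityitem"]/h1/text()')[0]
        calories = tree.xpath('//*[@id="activityitem"]/span/text()')
        weight = tree.xpath('//*[@id="activityitem"]/text()[6]')[0]
        variant = tree.xpath('//*[@id="activityitem"]/text()[3]')[0]
        print(title)
        print(calories)
        print(weight)
        print(variant)
        fo.write("\t".join([title, ", ".join(calories), weight.strip(), variant.strip()]) + "\n")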
Calories burned with Aerobic
['455']
Assuming a body weight of: 70 kg
General
Calories burned with Aerobic
['490']
Assuming a body weight of: 70 kg
High Impact
Calories burned with Aerobic
['350']
Assuming a body weight of: 70 kg
Low Impact
Calories burned with Aerobic
['700']
Assuming a body weight of: 70 kg
Step, With 10 - 12 Inch Step
Calories burned with Aerobic
['595']
Assuming a body weight of: 70 kg
Step, With 6 - 8 Inch Step
Calories burned with Anishinaabe Jingle Dancing or Other Traditional American Indian Dancing
['385']
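The scraped fields are still raw strings, and every figure assumes a 70 kg body weight. A hedged sketch that turns one scraped record into numbers and rescales it to another body weight. The rescaling assumes calories grow linearly with body weight, as in the common MET-based estimate (kcal ≈ MET × body mass in kg × hours); the site does not state its formula, so treat scaled values as approximations. `ActivityRecord`, `parse_record` and `scaled_to` are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass
class ActivityRecord:
    activity: str
    variant: str
    calories: int        # value shown on the page
    reference_kg: float  # body weight the page assumes

    def scaled_to(self, weight_kg):
        # Assumption: calories scale linearly with body weight
        return round(self.calories * weight_kg / self.reference_kg)


def parse_record(title, calories, weight_text, variant):
    """Convert the four scraped values into an ActivityRecord."""
    activity = title.replace("Calories burned with", "", 1).strip()
    # Pull the number out of e.g. "Assuming a body weight of: 70 kg"
    kg = float(re.search(r"([\d.]+)\s*kg", weight_text).group(1))
    return ActivityRecord(activity, variant.strip(), int(calories[0]), kg)


rec = parse_record("Calories burned with Aerobic", ["455"],
                   "Assuming a body weight of: 70 kg", "General")
print(rec)
print(rec.scaled_to(80))
```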
